Question 1. After exploring the Manhattan plots, and qq-plots from each GWAS, what can you tell about the power of each study?

# modified scripts to allow different inputs: 
# source(here::here("networks/Manhattan_plot.R"))
system(paste("Rscript",
             here::here("networks/Manhattan_plot.R"), 
             "-p networks/MS.pvals.out",
             "-o networks/ms_manhattan_plot.pdf"
             )
       )


system(paste("Rscript",
             here::here("networks/Manhattan_plot.R"), 
             "-p networks/HT.pvals.out",
             "-o networks/ht_manhattan_plot.pdf"
             )
       )

# source(here::here("networks/qqplot.R"))
system(paste("Rscript",
             here::here("networks/qqplot.R"), 
             "-p networks/MS.pvals.out",
             "-o networks/ms_qqplot.pdf"
             )
       )

system(paste("Rscript",
             here::here("networks/qqplot.R"), 
             "-p networks/HT.pvals.out",
             "-o networks/ht_qqplot.pdf"
             )
       )

Plot: Manhattan Plot for MS GWAS

Plot: Manhattan Plot for HT GWAS

Plot: QQplot for MS GWAS

Plot: QQplot for HT GWAS

The MS GWAS appears to be more powerful than the HT GWAS. We can infer this difference by noting that more variant associations lie above the y=x line in the MS QQplot, and that more variants exceed the horizontal significance threshold in the MS Manhattan plot.
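One way to put numbers on this visual comparison is sketched below. It is a minimal illustration, not part of the course scripts: `summarise_pvals` is a hypothetical helper, the p-values are simulated rather than read from `MS.pvals.out` or `HT.pvals.out`, and the 5e-8 cut-off is the conventional genome-wide threshold, which may differ from the line drawn by `Manhattan_plot.R`.

```r
# Hypothetical helper: summarise a numeric vector of GWAS p-values.
summarise_pvals <- function(p, threshold = 5e-8) {
  p <- sort(p[!is.na(p)])
  n <- length(p)
  expected <- -log10(ppoints(n))  # quantiles expected if p-values were uniform (null)
  observed <- -log10(p)
  list(
    # variants passing the significance threshold (Manhattan plot view)
    n_significant = sum(p < threshold),
    # points lying above the y = x line (QQ plot view)
    n_above_diagonal = sum(observed > expected),
    # genomic inflation factor: median observed chi-square over its null median
    lambda = median(qchisq(p, df = 1, lower.tail = FALSE)) / qchisq(0.5, df = 1)
  )
}

# Simulated studies (assumption: stand-ins for the real p-value files).
set.seed(1)
null_study   <- runif(10000)                         # no true associations
signal_study <- c(runif(9900), runif(100, 0, 1e-9))  # 100 strong associations

str(summarise_pvals(null_study))
str(summarise_pvals(signal_study))
```

A better-powered study would be expected to show a larger `n_significant` and more points above the diagonal. A `lambda` well above 1 would instead suggest that the whole distribution is inflated, which is a separate question from power.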




Question 2. Using Cytoscape, analyze the PPI and describe its main network.


Plot: Full Parent PPI Network

Parent PPI full network


The plot above shows the full parent network. Running the network analysis in Cytoscape produced the following summary statistics and plots:

Summary Statistics: Full Parent Network

Number of nodes: 8960
Number of edges: 27724
Avg. number of neighbors: 6.363
Network diameter: 13
Network radius: 7
Characteristic path length: 4.382
Clustering coefficient: 0.088
Network density: 0.001
Network heterogeneity: 2.063
Network centralization: 0.033
Connected components: 164
Analysis time (sec): 34.270


Plot: Parent PPI Network Degree Distribution Plot

Parent Degree Distribution

Plot: Parent PPI Betweenness by Degree Distribution

Parent Betweenness by Degree Distribution

Describe the main Parent PPI network

Most nodes in the main Parent PPI network have very few connections (low degree), as can be seen in the network degree plot. The ‘power-law’ shape of this plot also indicates that the network is not a random network; it could instead be scale-free or hierarchical.

The general upward trend of the betweenness by degree distribution indicates that a small number of genes in this network are hubs, i.e. common bridges that lie on the shortest paths between other genes. This could perhaps indicate that the network is hierarchical, rather than scale-free.
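The degree distribution discussed above can be reproduced outside Cytoscape with base R. The sketch below uses a small hypothetical edge list, not the parent PPI network, and `loglog_slope` is an illustrative helper. A straight line on log-log axes is only a rough heuristic for power-law behaviour, not a formal test.

```r
# Toy undirected edge list (hypothetical; not the parent PPI network).
edges <- data.frame(
  from = c("A", "A", "A", "A", "B", "C", "E"),
  to   = c("B", "C", "D", "E", "C", "D", "F"),
  stringsAsFactors = FALSE
)

# Degree of each node: the number of edge endpoints that touch it.
node_degree <- table(c(edges$from, edges$to))

# Degree distribution: how many nodes have each degree.
degree_dist <- table(node_degree)

# Crude power-law check: slope of log10(count) against log10(degree).
# A roughly straight line with negative slope is suggestive, not conclusive.
loglog_slope <- function(degree, count) {
  fit <- lm(log10(count) ~ log10(degree))
  unname(coef(fit)[2])
}

print(node_degree)
print(degree_dist)
loglog_slope(as.numeric(names(degree_dist)), as.numeric(degree_dist))
```

With only six nodes the fitted slope is not meaningful. The same steps applied to the full edge list would give the counts behind the degree distribution plot.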




Question 3. Using Cytoscape, find the first order networks (p<0.05) for each GWAS.

Plot: First order network for MS GWAS

MS Parent Network

Plot: First order network for HT GWAS

HT Parent Network




Question 4. Source “Pathway_permutation.r”. Are the first order networks from both GWAS more connected than expected? What does this mean?

#source(here::here("networks/Pathway_permutation.R"))
system(paste("Rscript",
             here::here("networks/Pathway_permutation.R"), 
             "-p networks/parent_PPI.sif",
             "-o networks/q4_pathway_permutation.pdf"
             )
       )


The above plots indicate that:

Overall, these observations suggest that the first order networks from the GWAS are more connected than would be expected (although this holds much more strongly for MS than for HT).
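The idea of "more connected than expected" can be illustrated with a small permutation test. This is a sketch under stated assumptions and not the method in `Pathway_permutation.R`: it takes the size of the largest connected component as the connectivity statistic and draws random node sets of the same size as the null. The edge list, `largest_component` and `permutation_p` are all hypothetical.

```r
# Size of the largest connected component in the subgraph induced by `nodes`.
largest_component <- function(edges, nodes) {
  keep <- edges$from %in% nodes & edges$to %in% nodes
  sub <- edges[keep, , drop = FALSE]
  unvisited <- unique(nodes)
  best <- 0
  while (length(unvisited) > 0) {
    frontier <- unvisited[1]
    component <- frontier
    # Breadth-first expansion until no new neighbours are found.
    while (length(frontier) > 0) {
      nbrs <- unique(c(sub$to[sub$from %in% frontier],
                       sub$from[sub$to %in% frontier]))
      frontier <- setdiff(nbrs, component)
      component <- c(component, frontier)
    }
    best <- max(best, length(component))
    unvisited <- setdiff(unvisited, component)
  }
  best
}

# Empirical p-value: how often a random node set of the same size is at
# least as connected as the observed hit set.
permutation_p <- function(edges, hits, n_perm = 1000) {
  all_nodes <- unique(c(edges$from, edges$to))
  observed <- largest_component(edges, hits)
  null_sizes <- replicate(
    n_perm,
    largest_component(edges, sample(all_nodes, length(hits)))
  )
  # Add-one correction so the estimate is never exactly zero.
  (sum(null_sizes >= observed) + 1) / (n_perm + 1)
}

# Toy network (hypothetical): one chain A-B-C-D plus two isolated pairs.
edges <- data.frame(
  from = c("A", "B", "C", "E", "G"),
  to   = c("B", "C", "D", "F", "H"),
  stringsAsFactors = FALSE
)

set.seed(42)
largest_component(edges, c("A", "B", "C", "D"))
permutation_p(edges, c("A", "B", "C", "D"), n_perm = 200)
```

A small p-value here would mean that random gene sets of the same size rarely form a component as large as the observed one, which is the sense in which a first order network is "more connected than expected".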




Question 5. Run BINGO App on all nodes from largest connected component. What biological processes emerge from the first order networks?

Plot: Network produced by taking the largest connected component of the full parent network and performing BINGO analysis

Largest component parent BINGO


Table: Top results from BINGO analysis of largest component of parent network

All big component BINGO results


Plot: Network produced by taking the largest connected component of first order network for MS and performing BINGO analysis

MS BINGO Network


Table: Top results from BINGO analysis of first order network for MS

MS BINGO Top Results

Among the top GO term enrichments from the BINGO analysis (above), immune-system-related processes clearly emerge as important.




Question 6. Map and color known MS and HT genes onto their respective first order nets. Interpret results.

Plot: MS first order net, known MS genes colored in yellow

MS gene coloured on first order network

Plot: HT first order net, known HT genes colored in yellow

HT genes coloured on first order network

Interpret the results:

The above plots suggest that the MS GWAS identified many more genes already known to be involved in MS pathogenesis than the HT GWAS identified for HT. In other words, the HT GWAS identified a greater proportion of genes that had yet to be characterised as contributing to this phenotype, and therefore this GWAS contributed a greater proportion of new knowledge.




Question 7. Repeat steps 3 and 4 with directed protein network from PNAS paper.

7a. Repeat Step 3: find the first order networks (p<0.05) for each GWAS.

Plot: First order MS network (subset of the directed network)

First order MS network

Plot: First order HT network (subset of the directed network)

First order HT network

7b. Repeat Step 4: Source “Pathway_permutation.r”. Are the first order networks from both GWAS more connected than expected? What does this mean?

#source(here::here("networks/Pathway_permutation.R"))
system(paste("Rscript",
             here::here("networks/Pathway_permutation.R"), 
             "-p networks/Directed_PPI.sif",
             "-o networks/q7_pathway_permutation.pdf"
             )
       )

*Note: there is evidently some problem with this plot …

The above plots indicate that:

  • HT first order networks have smaller largest connected components than would be expected given the number of edges (connections between genes) in these networks




Question 8. Color nodes by controllability category (dispensable, indispensable, neutral).

Plot: Full directed network coloured by controllability: blue (dispensable), red (indispensable), grey (neutral).

Full Directed Network Coloured

Plot: MS first order Network coloured by controllability: blue (dispensable), red (indispensable), grey (neutral).

MS first order network coloured

Plot: HT first order Network coloured by controllability: blue (dispensable), red (indispensable), grey (neutral).

HT first order network coloured




Question 9. Repeat step 6. Are MS-associated genes more enriched in any controllability category? Interpret.

9a. Repeat step 6: Map and color known MS and HT genes onto their respective first order nets. Interpret results.

Plot: MS first order net (from directed network), with known MS genes colored in red

MS first order network, color MS genes

Plot: HT first order network, with known HT genes colored red

HT first order network, color HT genes

9b. Are MS-associated genes more enriched in any controllability category? Interpret.

The null hypothesis to test is: The number of controllable genes in the MS-associated first order network is consistent with random sampling of controllable genes from the full directed network.

# k: number of nodes drawn, i.e. the total number of nodes in the
# MS-associated first order network
k <- 546

# m: total number of controllable nodes in the directed network,
# i.e. dispensable + indispensable
m <- 3677 - 8 # 8 is the number of unlabelled nodes

# n: number of remaining (non-controllable) nodes, where 6338 is the
# total number of nodes in the directed network
n <- 6338 - m

# q: the value to test, i.e. the observed number of controllable
# MS-associated genes
q <- 317

# upper tail of the hypergeometric distribution: P(X > q)
phyper(q = q,
       m = m,
       n = n,
       k = k,
       lower.tail = FALSE)
## [1] 0.4493194

This p-value suggests that MS-associated genes are not enriched for controllable genes. This is unsurprising given that the proportion of controllable genes among the MS-associated genes (317/546 \(\approx\) 58%) is very similar to the proportion of controllable genes in the entire directed network (3669/6338 \(\approx\) 58%).
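To make the test above more concrete, the sketch below restates it with the counts used in the code and compares the observed count to its expectation under random sampling. It also shows the difference between the strict upper tail P(X > q), which is what `lower.tail = FALSE` returns, and the inclusive tail P(X >= q) that is often used for enrichment. The variable names `expected`, `p_strict` and `p_inclusive` are illustrative.

```r
k <- 546        # nodes drawn: size of the MS first order network
m <- 3677 - 8   # controllable nodes in the directed network
n <- 6338 - m   # all other nodes in the directed network
q <- 317        # observed controllable nodes in the MS network

# Mean of the hypergeometric distribution: expected number of
# controllable nodes if the 546 nodes were sampled at random.
expected <- k * m / (m + n)

# Strict upper tail, P(X > q): what the original call computes.
p_strict <- phyper(q, m, n, k, lower.tail = FALSE)

# Inclusive upper tail, P(X >= q): shift q down by one.
p_inclusive <- phyper(q - 1, m, n, k, lower.tail = FALSE)

# The two tails differ by exactly the probability of observing q itself.
c(expected = expected, p_strict = p_strict, p_inclusive = p_inclusive)
```

The observed count of 317 is only about one node above the expectation of roughly 316, so either version of the tail probability stays far from any conventional significance level.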